Approximate Policy Iteration Schemes: A Comparison
Abstract
We consider the infinite-horizon discounted optimal control problem formalized by Markov Decision Processes. We focus on several approximate variations of the Policy Iteration algorithm: Approximate Policy Iteration (API) (Bertsekas & Tsitsiklis, 1996), Conservative Policy Iteration (CPI) (Kakade & Langford, 2002), a natural adaptation of the Policy Search by Dynamic Programming algorithm (Bagnell et al., 2003) to the infinite-horizon case (PSDP∞), and the recently proposed Non-Stationary Policy Iteration (NSPI(m)) (Scherrer & Lesner, 2012). For all algorithms, we describe performance bounds with respect to the per-iteration error ε, and compare them, paying particular attention to the concentrability constants involved, the number of iterations, and the memory required. Our analysis highlights the following points: 1) The performance guarantee of CPI can be arbitrarily better than that of API, but this comes at the cost of a relative increase, exponential in 1/ε, in the number of iterations. 2) PSDP∞ enjoys the best of both worlds: its performance guarantee is similar to that of CPI, but it is reached within a number of iterations similar to that of API. 3) Contrary to API, which requires constant memory, the memory needed by CPI and PSDP∞ is proportional to their number of iterations, which may be problematic when the discount factor γ is close to 1 or the approximation error is close to 0; we show that the NSPI(m) algorithm allows an overall trade-off between memory and performance. Simulations with these schemes confirm our analysis.

Proceedings of the 31st International Conference on Machine Learning, Beijing, China, 2014. JMLR: W&CP volume 32. Copyright 2014 by the author(s).
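As a rough illustration of the difference between the API and CPI updates named in the abstract, the sketch below runs both on a tiny tabular MDP. Everything in it is an assumption made for illustration, not the paper's setup: the 2-state, 2-action MDP is invented, policy evaluation is exact up to a numerical tolerance (so the per-iteration error is effectively zero), the CPI step size `alpha` is held fixed rather than chosen adaptively, and the function names are hypothetical.

```python
GAMMA = 0.9  # discount factor (assumed value)

# Hypothetical 2-state, 2-action MDP, invented for this sketch.
# P[s][a][t] = probability of moving from state s to state t under action a.
P = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.7, 0.3], [0.1, 0.9]]]
# R[s][a] = expected immediate reward for action a in state s.
R = [[0.0, 0.1],
     [0.5, 1.0]]
N_STATES, N_ACTIONS = 2, 2


def q_values(v):
    """One-step lookahead: q[s][a] = R[s][a] + GAMMA * sum_t P[s][a][t] * v[t]."""
    return [[R[s][a] + GAMMA * sum(P[s][a][t] * v[t] for t in range(N_STATES))
             for a in range(N_ACTIONS)]
            for s in range(N_STATES)]


def evaluate(pi, tol=1e-10):
    """Iterative policy evaluation for a stochastic policy pi[s][a]."""
    v = [0.0] * N_STATES
    while True:
        q = q_values(v)
        new_v = [sum(pi[s][a] * q[s][a] for a in range(N_ACTIONS))
                 for s in range(N_STATES)]
        if max(abs(x - y) for x, y in zip(new_v, v)) < tol:
            return new_v
        v = new_v


def greedy(v):
    """Deterministic greedy policy with respect to v, stored as probabilities."""
    q = q_values(v)
    pi = []
    for s in range(N_STATES):
        best = max(range(N_ACTIONS), key=lambda a: q[s][a])
        pi.append([1.0 if a == best else 0.0 for a in range(N_ACTIONS)])
    return pi


def api_step(pi):
    """API-style update: jump straight to the greedy policy."""
    return greedy(evaluate(pi))


def cpi_step(pi, alpha):
    """CPI-style update: mix the current policy with the greedy one."""
    g = greedy(evaluate(pi))
    return [[(1 - alpha) * pi[s][a] + alpha * g[s][a] for a in range(N_ACTIONS)]
            for s in range(N_STATES)]


def run(step, n_iters):
    """Apply `step` n_iters times, starting from 'always take action 0'."""
    pi = [[1.0, 0.0] for _ in range(N_STATES)]
    for _ in range(n_iters):
        pi = step(pi)
    return pi, evaluate(pi)


if __name__ == "__main__":
    print("API, 1 iteration: ", run(api_step, 1))
    print("CPI, 3 iterations:", run(lambda pi: cpi_step(pi, 0.5), 3))
```

With exact evaluation, the greedy jump is harmless and the API-style update reaches the greedy policy in one step, while the conservative mixture only moves part of the way each iteration. This toy setting cannot show the benefit of being conservative, which only appears when evaluation is approximate. It also stores the mixture as a table, so it does not reproduce the memory growth the abstract attributes to CPI and PSDP∞.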
Publication date: 2014